Multi-classification of Patent Applications with Winnow
Authors
Abstract
The Winnow family of learning algorithms can cope well with large numbers of features and is tolerant to variations in document length, which makes it suitable for classifying large collections of large documents, like patent applications. Both the large size of the documents and the large number of available training documents for each class make this classification task qualitatively different from the classification of short documents (newspaper articles or medical abstracts) with few training examples, as exemplified by the TREC evaluations. This note describes recent experiments with Winnow on two large corpora of patent applications, supplied by the European Patent Office (EPO). It is found that the multi-classification of patent applications is much less accurate than the mono-classification of similar documents. We describe a potential pitfall in multi-classification and show ways to improve the accuracy. We argue that the inherently larger noisiness of multi-class labeling is the reason that multi-classification is harder than mono-classification.
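The abstract does not spell out the update rule or the decision procedure, so the following is only a minimal sketch of a textbook Balanced Winnow learner, not the authors' implementation. It illustrates the distinction the abstract draws: mono-classification picks the single best-scoring class, while multi-classification accepts every class whose score exceeds a threshold. The promotion and demotion factors (`alpha`, `beta`), the threshold, the initial weights, and the toy documents are all assumptions chosen for illustration.

```python
from collections import defaultdict


class BalancedWinnow:
    """One class-vs-rest classifier with multiplicative weight updates."""

    def __init__(self, alpha=1.5, beta=0.5, threshold=1.0):
        # alpha, beta and threshold are illustrative values, not from the paper.
        self.alpha, self.beta, self.threshold = alpha, beta, threshold
        self.pos = defaultdict(lambda: 2.0)  # positive weight per term
        self.neg = defaultdict(lambda: 1.0)  # negative weight per term

    def score(self, doc):
        # doc maps each term to its strength (here: normalized frequency).
        return sum((self.pos[t] - self.neg[t]) * s for t, s in doc.items())

    def update(self, doc, is_member):
        # Mistake-driven: weights change only when the prediction is wrong.
        if (self.score(doc) > self.threshold) == is_member:
            return
        up, down = (self.alpha, self.beta) if is_member else (self.beta, self.alpha)
        for t in doc:
            self.pos[t] *= up
            self.neg[t] *= down


def train(docs, classes, epochs=5):
    models = {c: BalancedWinnow() for c in classes}
    for _ in range(epochs):
        for doc, labels in docs:
            for c, model in models.items():
                model.update(doc, c in labels)
    return models


def mono_classify(models, doc):
    # Exactly one class: the one with the highest score.
    return max(models, key=lambda c: models[c].score(doc))


def multi_classify(models, doc):
    # Zero or more classes: every class scoring above its threshold.
    return sorted(c for c, m in models.items() if m.score(doc) > m.threshold)


# Invented toy corpus; the third document carries two labels.
DOCS = [
    ({"engine": 0.5, "piston": 0.5}, {"mech"}),
    ({"circuit": 0.5, "voltage": 0.5}, {"elec"}),
    ({"engine": 0.5, "circuit": 0.5}, {"mech", "elec"}),
]
MODELS = train(DOCS, ["mech", "elec"])
```

In this sketch, `mono_classify(MODELS, DOCS[0][0])` selects a single class for the first document, whereas `multi_classify(MODELS, DOCS[2][0])` can return both labels for the third. A multi-label training set of this kind depends on every document being tagged with all of its applicable classes, which is where the labeling noise discussed in the abstract would enter.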
Related resources
Text Categorization for Intellectual Property Comparing Balanced Winnow with SVM on Different Document Representations
This study investigates the effect of training different categorization algorithms on various patent document representations. The automation of knowledge and content management in the intellectual property domain has been experiencing a growing interest in the last decade, since the first patent classification system was presented in 1999 by Larkey [Larkey, 1999]. Typical applications of paten...
Comparative Analysis of Balanced Winnow and SVM in Large Scale Patent Categorization
This study investigates the effect of training different categorization algorithms on a corpus that is significantly larger than those reported in experiments in the literature. By means of machine learning techniques, a collection of 1.2 million patent applications is used to build a classifier that is able to classify documents with varyingly large feature spaces into the International Classi...
myClass: A Mature Tool for Patent Classification
In this task 2,000 patents in three languages (English, French and German) were to be classified among approximately 600 categories. We used a classifier based on neural networks of the Winnow type. This classifier is already used for similar tasks in professional applications. We tested three different approaches to improve the classification accuracy: the first one aimed at solving the issue ...
Hand Gestures Classification with Multi-Core DTW
Classifications of several gesture types are very helpful in several applications. This paper tries to address fast classifications of hand gestures using DTW over multi-core simple processors. We presented a methodology to distribute templates over multi-cores and then allow parallel execution of the classification. The results were presented to voting algorithm in which the majority vote was ...
Text Chunking based on a Generalization of Winnow
This paper describes a text chunking system based on a generalization of the Winnow algorithm. We propose a general statistical model for text chunking which we then convert into a classification problem. We argue that the Winnow family of algorithms is particularly suitable for solving classification problems arising from NLP applications, due to their robustness to irrelevant features. Howeve...